Some Novel Heuristics for Finding the Most Unusual Time Series Subsequences

Authors

  • Mai Thai Son
  • Duong Tuan Anh
Abstract

In this work, we introduce some novel heuristics that can enhance the efficiency of the Heuristic Discord Discovery (HDD) algorithm, proposed by Keogh et al. for finding the most unusual time series subsequences, called time series discords. Our heuristics consist of (1) a new discord measure function, which helps to set up a range of alternative good orderings for the outer loop of the HDD algorithm, and (2) a branch-and-bound search mechanism carried out in the inner loop of the algorithm. Extensive experiments on a variety of diverse datasets show that our scheme performs better than the previous schemes HOT SAX and WAT.
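The abstract builds on the generic HDD framework of Keogh et al.: an outer loop over candidate subsequences, an inner loop that searches for each candidate's nearest non-self match, and pruning as soon as a candidate can no longer beat the best discord found so far. The Python sketch below illustrates only that generic framework; it does not implement the authors' new discord measure function, which is not defined here. The `outer_order` and `inner_order` parameters are hypothetical hooks where a heuristic ordering (such as the one the paper proposes) would plug in. Early abandoning of the Euclidean distance computation, a common pruning trick, is used as a stand-in for inner-loop pruning and is not necessarily the paper's branch-and-bound mechanism. All function names are illustrative, and subsequences are compared with raw (not z-normalized) Euclidean distance for brevity.

```python
import math
from typing import Iterable, Optional, Sequence, Tuple


def sq_dist_early_abandon(a: Sequence[float], b: Sequence[float], cutoff_sq: float) -> float:
    """Squared Euclidean distance that gives up once it reaches cutoff_sq.

    Returns math.inf when abandoned: such a pair cannot lower the current
    nearest-neighbour distance, so its exact value is not needed.
    """
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total >= cutoff_sq:
            return math.inf
    return total


def hdd_discord(ts: Sequence[float], n: int,
                outer_order: Optional[Iterable[int]] = None,
                inner_order: Optional[Iterable[int]] = None) -> Tuple[int, float]:
    """Generic HDD-style discord search (illustrative sketch).

    Returns (location, nearest-neighbour distance) of the length-n subsequence
    whose nearest non-self match is farthest away; (-1, 0.0) if none is found.
    """
    m = len(ts) - n + 1
    subs = [ts[i:i + n] for i in range(m)]
    outer = list(outer_order) if outer_order is not None else list(range(m))
    inner = list(inner_order) if inner_order is not None else list(range(m))
    best_so_far_sq, best_loc = 0.0, -1
    for p in outer:
        nn_sq = math.inf  # current nearest-neighbour distance of candidate p
        for q in inner:
            if abs(p - q) < n:  # skip trivial (overlapping) self-matches
                continue
            d_sq = sq_dist_early_abandon(subs[p], subs[q], nn_sq)
            if d_sq < nn_sq:
                nn_sq = d_sq
            if nn_sq < best_so_far_sq:
                break  # p cannot be the discord: prune the rest of the inner loop
        if nn_sq != math.inf and nn_sq > best_so_far_sq:
            best_so_far_sq, best_loc = nn_sq, p
    return best_loc, math.sqrt(best_so_far_sq)


def brute_force_discord(ts: Sequence[float], n: int) -> Tuple[int, float]:
    """Exhaustive baseline with no pruning, for checking the sketch above."""
    m = len(ts) - n + 1
    subs = [ts[i:i + n] for i in range(m)]
    best_sq, best_loc = 0.0, -1
    for p in range(m):
        nn_sq = min((sum((x - y) ** 2 for x, y in zip(subs[p], subs[q]))
                     for q in range(m) if abs(p - q) >= n), default=math.inf)
        if nn_sq != math.inf and nn_sq > best_sq:
            best_sq, best_loc = nn_sq, p
    return best_loc, math.sqrt(best_sq)


# Usage: a periodic signal with an injected bump around positions 150-159.
ts = [math.sin(2 * math.pi * i / 25) for i in range(300)]
for i in range(150, 160):
    ts[i] += 1.5
loc, dist = hdd_discord(ts, n=20)
print(f"discord at {loc}, nearest-neighbour distance {dist:.3f}")
```

With the default natural orderings, the sketch returns the same result as the exhaustive baseline; the orderings only change how much work is pruned. A good outer ordering finds a large best-so-far distance early, and a good inner ordering finds a close neighbour early, so more inner loops terminate after only a few iterations.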

Similar resources

Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM

Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's folding and functionality. The biggest class of repetitive subsequences is “Transposable Elements”, which has its own subclasses based on the structure of their contexts. Many studies have been performed to critically determine the structure and function of repetitiv...

Finding the Unusual Medical Time Series: Algorithms and Applications

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors b...

Searching for repeats, as an example of using the generalised Ruzzo-Tompa algorithm to find optimal subsequences with gaps

Some biological sequences contain subsequences of unusual composition; e.g. some proteins contain DNA binding domains, transmembrane regions and charged regions, and some DNA sequences contain repeats. The linear-time Ruzzo-Tompa (RT) algorithm finds subsequences of unusual composition, using a sequence of scores as input and the corresponding 'maximal segments' as output. In principle, permitt...

Locating Motifs in Time-Series Data

Finding motifs in time series has been proposed as a way to make clustering of time-series subsequences meaningful, because most existing algorithms for clustering time-series subsequences have been reported to be meaningless in recent studies. The existing motif finding algorithms emphasize efficiency at the expense of quality, in terms of the number of time-series subsequences in a motif and the total number of moti...

Similar Subsequence Search in Time Series Databases

Finding matching subsequences in time series data is an important problem. The classical approach to searching for matching subsequences has been based on the principle of exhaustive search, where all possible candidates are generated and evaluated or all the terms of the time series in a database are examined. As a result, most subsequence search algorithms are cubic in nature, with few algorithm...

Publication date: 2010